Balanced Binary Tree (Practice Interview Question)


Write a function to see if a binary tree is "superbalanced". First, some background on binary trees:

A binary tree is a tree where every node has two or fewer children. The children are usually called left and right.

class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

This lets us build a structure like this:

That particular example is special because every level of the tree is completely full. There are no "gaps." We call this kind of tree "perfect."

Binary trees have a few interesting properties when they're perfect:

Property 1: the number of total nodes on each "level" doubles as we move down the tree.

Property 2: the number of nodes on the last level is equal to the sum of the number of nodes on all other levels (plus 1). In other words, about half of our nodes are on the last level.
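As a quick sanity check on these two properties, here is a minimal sketch that counts the nodes on each level. The helper name `level_sizes` is made up for illustration, and it assumes a perfect tree with `height` levels.

```python
def level_sizes(height):
    """Return the node count of each level in a perfect binary tree."""
    sizes = []
    count = 1  # the root level holds a single node
    for _ in range(height):
        sizes.append(count)
        count *= 2  # Property 1: each level doubles the one above it
    return sizes


sizes = level_sizes(4)
print(sizes)  # [1, 2, 4, 8]

# Property 2: the last level is one more than all other levels combined
print(sizes[-1] == sum(sizes[:-1]) + 1)  # True
```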

Let's call the number of nodes $n$ , and the height of the tree $h$ . $h$ can also be thought of as the "number of levels."

If we had $h$ , how could we calculate $n$ ?

Let's just add up the number of nodes on each level! How many nodes are on each level?

If we zero-index the levels, the number of nodes on the $x$ th level is exactly $2^x$ .

Level $0$ : $2^0$ nodes,

Level $1$ : $2^1$ nodes,

Level $2$ : $2^2$ nodes,

Level $3$ : $2^3$ nodes,

etc

So our total number of nodes is:
$n = 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^{h-1}$
Why only up to $2^{h-1}$ ? Notice that we started counting our levels at 0. So if we have $h$ levels in total, the last level is actually the " $h-1$ "-th level. That means the number of nodes on the last level is $2^{h-1}$ .

But we can simplify. Property 2 tells us that the last level holds one more node than all the other levels combined, so the total is twice the last level minus 1: take the number of nodes on the last level, multiply it by 2, and subtract 1. We know the number of nodes on the last level is $2^{h-1}$, so:
$n = 2^{h-1} * 2 - 1$
$n = 2^{h-1} * 2^1 - 1$
$n = 2^{h-1+1} - 1$
$n = 2^{h} - 1$
So that's how we can go from $h$ to $n$ . What about the other direction?

We need to bring the $h$ down from the exponent. That's what logs are for!

First, some quick review. $\log_{10} (100)$ simply means, "What power must you raise 10 to in order to get 100?" The answer is 2, because $10^2 = 100$.

We can use logs in algebra to bring variables down from exponents by exploiting the fact that we can simplify $\log_{10}(10^2)$ . What power must we raise $10$ to in order to get $10^2$ ? That's easy—it's $2$ .

So in this case we can take the $\log_{2}$ of both sides:
$n = 2^{h} - 1$
$n + 1 = 2^{h}$
$\log_{2}{(n+1)} = \log_{2}{(2^{h})}$
$\log_{2}{(n+1)} = h$
So that's the relationship between height and total nodes in a perfect binary tree.
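The two formulas can be checked against each other with a short sketch. The function names `node_count` and `height_from_nodes` are assumptions chosen for this example, and the second one is only meaningful for perfect trees, where $n + 1$ is a power of two.

```python
import math


def node_count(height):
    """n = 2^h - 1 for a perfect binary tree with `height` levels."""
    return 2 ** height - 1


def height_from_nodes(n):
    """h = log2(n + 1); only meaningful when n + 1 is a power of two."""
    return int(math.log2(n + 1))


print(node_count(4))          # 15
print(height_from_nodes(15))  # 4
```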

Back to our question: is a given binary tree "superbalanced" (a new tree property we just made up)?

A tree is "superbalanced" if the difference between the depths of any two leaf nodes is no greater than one.

A leaf node is a tree node with no children. It's the "end" of a path to the bottom, from the root.

Here's a sample binary tree node class:

class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left  = None
        self.right = None

    def insert_left(self, value):
        self.left = BinaryTreeNode(value)
        return self.left

    def insert_right(self, value):
        self.right = BinaryTreeNode(value)
        return self.right

Gotchas

Your first thought might be to write a recursive function, thinking, "the tree is balanced if the left subtree is balanced and the right subtree is balanced." This kind of approach works well for some other tree problems.

But this isn't quite true. Counterexample: suppose that from the root of our tree:

The left subtree only has leaves at depths 10 and 11.
The right subtree only has leaves at depths 11 and 12.

Both subtrees are balanced, but from the root we will have leaves at 3 different depths.

We could instead have our recursive function get the list of distinct leaf depths for each subtree. That could work fine. But let's come up with an iterative solution instead. An iterative solution is usually the safer choice over a recursive one because it avoids stack overflow on deep trees.
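For reference, the recursive alternative mentioned above could look roughly like this. It is only a sketch: `leaf_depths` and `is_balanced_recursive` are hypothetical names, the node class is repeated so the block runs on its own, and the sample tree is a small version of the counterexample (each subtree is fine by itself, but the whole tree is not).

```python
class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def leaf_depths(node, depth=0):
    """Return the set of distinct depths at which leaves appear."""
    if node is None:
        return set()
    if node.left is None and node.right is None:
        return {depth}
    return leaf_depths(node.left, depth + 1) | leaf_depths(node.right, depth + 1)


def is_balanced_recursive(root):
    depths = leaf_depths(root)
    # An empty tree has no leaves, so there is nothing to violate the rule
    return not depths or max(depths) - min(depths) <= 1


# Left subtree: a single leaf. Right subtree: a chain ending in one leaf.
root = BinaryTreeNode(1)
root.left = BinaryTreeNode(2)
root.right = BinaryTreeNode(3)
root.right.right = BinaryTreeNode(4)
root.right.right.right = BinaryTreeNode(5)

print(sorted(leaf_depths(root)))    # [1, 3]
print(is_balanced_recursive(root))  # False
```

Note that this version always visits every node and still relies on the call stack, so it neither short-circuits nor avoids deep recursion.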

Overview

The call stack is what a program uses to keep track of function calls. The call stack is made up of stack frames—one for each function call.

For instance, say we called a function that rolled two dice and printed the sum.

import random

def roll_die():
    return random.randint(1, 6)

def roll_two_and_sum():
    total = 0
    total += roll_die()
    total += roll_die()
    print(total)

roll_two_and_sum()

First, our program calls roll_two_and_sum(). It goes on the call stack:

roll_two_and_sum()

That function calls roll_die(), which gets pushed on to the top of the call stack:

roll_die()

roll_two_and_sum()

Inside of roll_die(), we call random.randint(). Here's what our call stack looks like then:

random.randint()

roll_die()

roll_two_and_sum()

When random.randint() finishes, we return back to roll_die() by removing ("popping") random.randint()'s stack frame.

roll_die()

roll_two_and_sum()

Same thing when roll_die() returns:

roll_two_and_sum()

We're not done yet! roll_two_and_sum() calls roll_die() again:

roll_die()

roll_two_and_sum()

Which calls random.randint() again:

random.randint()

roll_die()

roll_two_and_sum()

random.randint() returns, then roll_die() returns, putting us back in roll_two_and_sum():

roll_two_and_sum()

Which calls print():

print()

roll_two_and_sum()
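One way to watch this happen in Python is the standard `traceback` module, which can list the frames currently on the call stack. The sketch below is a variation on the example above: the `seen_stacks` list is an addition made for illustration, and `roll_two_and_sum()` returns the total instead of printing it.

```python
import random
import traceback

seen_stacks = []


def roll_die():
    # Record the names of the functions currently on the call stack,
    # ordered from the oldest call to the most recent one
    stack = traceback.extract_stack()
    seen_stacks.append([frame.name for frame in stack])
    return random.randint(1, 6)


def roll_two_and_sum():
    total = 0
    total += roll_die()
    total += roll_die()
    return total


roll_two_and_sum()
print(seen_stacks[0][-2:])  # ['roll_two_and_sum', 'roll_die']
```

Any frames listed before those two belong to whatever started the program (for example the module itself).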

What's stored in a stack frame?

What actually goes in a function's stack frame?

A stack frame usually stores:

Local variables
Arguments passed into the function
Information about the caller's stack frame
The return address—what the program should do after the function returns (i.e.: where it should "return to"). This is usually somewhere in the middle of the caller's code.

Some of the specifics vary between processor architectures. For instance, AMD64 (64-bit x86) processors pass some arguments in registers and some on the call stack. And, ARM processors (common in phones) store the return address in a special register instead of putting it on the call stack.

The Space Cost of Stack Frames

Each function call creates its own stack frame, taking up space on the call stack. That's important because it can impact the space complexity of an algorithm. Especially when we use recursion.

For example, if we wanted to multiply all the numbers between $1$ and $n$ , we could use this recursive approach:

def product_1_to_n(n):
    return 1 if n <= 1 else n * product_1_to_n(n - 1)

What would the call stack look like when n = 10?

First, product_1_to_n() gets called with n = 10:

    product_1_to_n()    n = 10

This calls product_1_to_n() with n = 9.

    product_1_to_n()    n = 9

    product_1_to_n()    n = 10

Which calls product_1_to_n() with n = 8.

    product_1_to_n()    n = 8

    product_1_to_n()    n = 9

    product_1_to_n()    n = 10

And so on until we get to n = 1.

    product_1_to_n()    n = 1

    product_1_to_n()    n = 2

    product_1_to_n()    n = 3

    product_1_to_n()    n = 4

    product_1_to_n()    n = 5

    product_1_to_n()    n = 6

    product_1_to_n()    n = 7

    product_1_to_n()    n = 8

    product_1_to_n()    n = 9

    product_1_to_n()    n = 10

Look at the size of all those stack frames! The entire call stack takes up $O(n)$ space. That's right—we have an $O(n)$ space cost even though our function itself doesn't create any data structures!

What if we'd used an iterative approach instead of a recursive one?

def product_1_to_n(n):
    # We assume n >= 1
    result = 1
    for num in range(1, n + 1):
        result *= num

    return result

This version takes a constant amount of space. At the beginning of the loop, the call stack looks like this:

    product_1_to_n()    n = 10, result = 1, num = 1

As we iterate through the loop, the local variables change, but we stay in the same stack frame because we don't call any other functions.

    product_1_to_n()    n = 10, result = 2, num = 2

    product_1_to_n()    n = 10, result = 6, num = 3

    product_1_to_n()    n = 10, result = 24, num = 4

In general, even though the compiler or interpreter will take care of managing the call stack for you, it's important to consider the depth of the call stack when analyzing the space complexity of an algorithm.

Be especially careful with recursive functions! They can end up building huge call stacks.

What happens if we run out of space? It's a stack overflow! In Python 3.6, you'll get a RecursionError.
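Here is a small sketch of that failure, reusing the recursive `product_1_to_n()` from above. It assumes CPython's default behavior, where the interpreter enforces a recursion limit that can be read with `sys.getrecursionlimit()`; the variable names `too_deep` and `overflowed` are made up for the example.

```python
import sys


def product_1_to_n(n):
    return 1 if n <= 1 else n * product_1_to_n(n - 1)


# Ask for more nested calls than the interpreter's current limit allows
too_deep = sys.getrecursionlimit() + 100

try:
    product_1_to_n(too_deep)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)  # True
```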

If the very last thing a function does is call another function, then its stack frame might not be needed any more. The function could free up its stack frame before doing its final call, saving space.

This is called tail call optimization (TCO). If a recursive function is optimized with TCO, then it may not end up with a big call stack.

In general, most languages don't provide TCO. Scheme is one of the few languages that guarantee tail call optimization. Some Ruby, C, and JavaScript implementations may do it. Python and Java decidedly don't.

We can do this in $O(n)$ time and $O(n)$ space.

What about a tree with only one leaf node? Does your function handle that case properly?

Breakdown

Sometimes it's good to start by rephrasing or "simplifying" the problem.

The requirement of "the difference between the depths of any two leaf nodes is no greater than 1" implies that we'll have to compare the depths of all possible pairs of leaves. That'd be expensive—if there are $n$ leaves, there are $O(n^2)$ possible pairs of leaves.

But we can simplify this requirement to require less work. For example, we could equivalently say:

"The difference between the min leaf depth and the max leaf depth is 1 or less"
"There are at most two distinct leaf depths, and they are at most 1 apart"

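To see that the simplified requirement really is equivalent, here is a minimal sketch that compares the literal pairwise check with the min/max check on a plain list of leaf depths. Both function names are hypothetical, and the input is assumed to be a list of integers already collected from the tree.

```python
from itertools import combinations


def balanced_by_pairs(leaf_depths):
    """Literal reading of the rule: compare every pair of leaf depths."""
    return all(abs(a - b) <= 1 for a, b in combinations(leaf_depths, 2))


def balanced_by_min_max(leaf_depths):
    """Simplified rule: only the shallowest and deepest leaves matter."""
    if not leaf_depths:
        return True
    return max(leaf_depths) - min(leaf_depths) <= 1


print(balanced_by_min_max([3, 4, 4, 3]))  # True
print(balanced_by_min_max([2, 3, 4]))     # False
```

The pairwise version does quadratic work in the number of leaves, while the min/max version needs only a single pass.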
If you're having trouble with a recursive approach, try using an iterative one.

To get to our leaves and measure their depths, we'll have to traverse the tree somehow. What methods do we know for traversing a tree?

Depth-first:

Depth-first search (DFS) is a method for exploring a tree or graph. In a DFS, you go as deep as possible down one path before backing up and trying a different one.

Depth-first search is like walking through a corn maze. You explore one path, hit a dead end, and go back and try a different one.

Here's how a DFS would traverse this tree, starting with the root:

We'd go down the first path we find until we hit a dead end:

[Figure: the same binary tree with all nodes in the leftmost branch bolded after being visited.]

Then we'd do the same thing again, heading down the next leftmost path until we hit a dead end:

And again:

Until we've visited every node in the tree.

Depth-first search is often compared with breadth-first search.

Advantages:

Depth-first search on a binary tree generally requires less memory than breadth-first.
Depth-first search can be easily implemented with recursion.

Disadvantages

A DFS doesn't necessarily find the shortest path to a node, while breadth-first search does.
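As an illustration of how little code a recursive DFS needs, here is a minimal sketch. The function name `depth_first_values` and the sample tree are assumptions made for this example, and the node class is repeated so the block runs on its own.

```python
class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def depth_first_values(node, visited=None):
    """Visit a node, then its whole left subtree, then its right subtree."""
    if visited is None:
        visited = []
    if node is not None:
        visited.append(node.value)
        depth_first_values(node.left, visited)
        depth_first_values(node.right, visited)
    return visited


#       1
#      / \
#     2   3
#    / \
#   4   5
root = BinaryTreeNode(1)
root.left = BinaryTreeNode(2)
root.right = BinaryTreeNode(3)
root.left.left = BinaryTreeNode(4)
root.left.right = BinaryTreeNode(5)

print(depth_first_values(root))  # [1, 2, 4, 5, 3]
```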

Breadth-first:

Breadth-first search (BFS) is a method for exploring a tree or graph. In a BFS, you first explore all the nodes one step away, then all the nodes two steps away, etc.

Breadth-first search is like throwing a stone in the center of a pond. The nodes you explore "ripple out" from the starting point.

Here's how a BFS would traverse this tree, starting with the root:

We'd visit all the immediate children (all the nodes that're one step away from our starting node):

[Figure: the same 4-row binary tree with all nodes at depth 1 (second row) bolded after being visited.]

Then we'd move on to all those nodes' children (all the nodes that're two steps away from our starting node):

[Figure: the same 4-row binary tree with all nodes at depth 2 (third row) bolded after being visited.]

And so on:

[Figure: the same 4-row binary tree with all nodes at depth 3 (fourth and final row) bolded after being visited.]

Until we reach the end.

Breadth-first search is often compared with depth-first search.

Advantages:

A BFS will find the shortest path between the starting point and any other reachable node. A depth-first search will not necessarily find the shortest path.

Disadvantages

A BFS on a binary tree generally requires more memory than a DFS.
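A breadth-first traversal can be sketched with a queue. This example uses `collections.deque` from the standard library as the queue; the function name `breadth_first_values` and the sample tree are assumptions made for illustration.

```python
from collections import deque


class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def breadth_first_values(root):
    """Visit nodes level by level, left to right within each level."""
    visited = []
    queue = deque([root]) if root is not None else deque()
    while queue:
        node = queue.popleft()  # oldest node first (FIFO)
        visited.append(node.value)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return visited


#       1
#      / \
#     2   3
#    / \
#   4   5
root = BinaryTreeNode(1)
root.left = BinaryTreeNode(2)
root.right = BinaryTreeNode(3)
root.left.left = BinaryTreeNode(4)
root.left.right = BinaryTreeNode(5)

print(breadth_first_values(root))  # [1, 2, 3, 4, 5]
```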

Depth-first and breadth-first are both common ways to traverse a tree. Which one should we use here?

The worst-case time and space costs of both are the same—you could make a case for either.

But one characteristic of our algorithm is that it could short-circuit and return False as soon as it finds two leaves with depths more than 1 apart. So maybe we should use a traversal that will hit leaves as quickly as possible...

Depth-first traversal will generally hit leaves before breadth-first, so let's go with that. How could we write a depth-first walk that also keeps track of our depth?

Solution

We do a depth-first walk through our tree, keeping track of the depth as we go. When we find a leaf, we add its depth to a list of depths if we haven't seen that depth already.

Each time we hit a leaf with a new depth, there are two ways that our tree might now be unbalanced:

There are more than 2 different leaf depths
There are exactly 2 leaf depths and they are more than 1 apart.

Why are we doing a depth-first walk and not a breadth-first one? You could make a case for either. We chose depth-first because it reaches leaves faster, which allows us to short-circuit earlier in some cases.

def is_balanced(tree_root):

    # A tree with no nodes is superbalanced, since there are no leaves!
    if tree_root is None:
        return True

    # We short-circuit as soon as we find more than 2 distinct leaf depths
    depths = []

    # We'll treat this list as a stack that will store tuples of (node, depth)
    nodes = []
    nodes.append((tree_root, 0))

    while nodes:
        # Pop a node and its depth from the top of our stack
        node, depth = nodes.pop()

        # Case: we found a leaf
        if (not node.left) and (not node.right):
            # We only care if it's a new depth
            if depth not in depths:
                depths.append(depth)

                # Two ways we might now have an unbalanced tree:
                #   1) more than 2 different leaf depths
                #   2) 2 leaf depths that are more than 1 apart
                if ((len(depths) > 2) or
                        (len(depths) == 2 and abs(depths[0] - depths[1]) > 1)):
                    return False
        else:
            # Case: this isn't a leaf - keep stepping down
            if node.left:
                nodes.append((node.left, depth + 1))
            if node.right:
                nodes.append((node.right, depth + 1))

    return True

Complexity

$O(n)$ time and $O(n)$ space.

For time, the worst case is the tree is balanced and we have to iterate over all $n$ nodes to make sure.

For the space cost, we have two data structures to watch: depths and nodes.

depths will never hold more than three elements, so we can write that off as $O(1)$ .

Because we're doing a depth-first search, nodes will hold roughly $d$ nodes at most, where $d$ is the depth of the tree (the number of levels in the tree from the root node down to the lowest node). So we could say our space cost is $O(d)$.

But we can also relate $d$ to $n$ . In a balanced tree, $d$ is $O(\log_{2}(n))$ . And the more unbalanced the tree gets, the closer $d$ gets to $n$ .

In the worst case, the tree is a straight line of right children from the root where every node in that line also has a left child. The traversal will walk down the line of right children, adding a new left child to nodes at each step. When the traversal hits the rightmost node, nodes will hold half of the $n$ total nodes in the tree. Half $n$ is $O(n)$ , so our worst case space cost is $O(n)$ .
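The worst case described above can be measured directly. The sketch below builds that tree shape and records how large the stack gets during a plain depth-first walk. It deliberately leaves out the short-circuit logic of `is_balanced()` so the whole tree is traversed; `build_worst_case` and `max_stack_size` are hypothetical helper names.

```python
class BinaryTreeNode(object):

    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def build_worst_case(line_length):
    """Build a line of right children where every node also has a left child."""
    root = BinaryTreeNode(0)
    current = root
    for i in range(1, line_length):
        current.left = BinaryTreeNode(-i)
        current.right = BinaryTreeNode(i)
        current = current.right
    current.left = BinaryTreeNode(-line_length)
    return root


def max_stack_size(root):
    """Walk the tree depth-first and return the largest stack size seen."""
    nodes = [root]
    largest = 1
    while nodes:
        node = nodes.pop()
        if node.left:
            nodes.append(node.left)
        if node.right:
            nodes.append(node.right)
        largest = max(largest, len(nodes))
    return largest


root = build_worst_case(50)  # 50 nodes in the line + 50 left children = 100
print(max_stack_size(root))  # 50
```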

What We Learned

This is an intro to some tree basics. If this is new to you, don't worry—it can take a few questions for this stuff to come together. We have a few more coming up.

Particular things to note:

Focus on depth-first vs breadth-first traversal. You should be very comfortable with the differences between the two and the strengths and weaknesses of each.

You should also be very comfortable coding each of them up.

One tip: Remember that breadth-first uses a queue:

Quick reference

	Worst Case
space	$O(n)$
enqueue	$O(1)$
dequeue	$O(1)$
peek	$O(1)$

A queue stores items in a first-in, first-out (FIFO) order.

Picture a queue like the line outside a busy restaurant. First come, first served.

Strengths:

Fast operations. All queue operations take $O(1)$ time.

Uses

Breadth-first search uses a queue to keep track of the nodes to visit next.

Printers use queues to manage jobs—jobs get printed in the order they're submitted.

Web servers use queues to manage requests—page requests get fulfilled in the order they're received.

Processes wait in the CPU scheduler's queue for their turn to run.

Implementation

Queues are easy to implement with linked lists:

To enqueue, insert at the tail of the linked list.

To dequeue, remove at the head of the linked list.

You could implement a queue with an array or dynamic array, but it would get kinda messy. Try drawing it out. You'll notice that you'd need to build out a "scoot over" or "re-center" operation that automatically fires when your queue items hit the bottom edge of the array.
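The linked-list approach described above could be sketched as follows. The class names `_Node` and `LinkedListQueue` are made up for this example, and raising `IndexError` on an empty queue is a design choice, not something the text prescribes.

```python
class _Node(object):

    def __init__(self, item):
        self.item = item
        self.next = None


class LinkedListQueue(object):

    def __init__(self):
        self.head = None  # next item to dequeue
        self.tail = None  # most recently enqueued item

    def enqueue(self, item):
        # Insert at the tail of the linked list
        node = _Node(item)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def dequeue(self):
        # Remove at the head of the linked list
        if self.head is None:
            raise IndexError('dequeue from an empty queue')
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.item

    def peek(self):
        if self.head is None:
            raise IndexError('peek at an empty queue')
        return self.head.item


queue = LinkedListQueue()
for job in ['a', 'b', 'c']:
    queue.enqueue(job)

print(queue.dequeue())  # a
print(queue.peek())     # b
```

In everyday Python code, `collections.deque` from the standard library is the usual ready-made choice for a queue.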


And depth-first uses a stack:

Quick reference

	Worst Case
space	$O(n)$
push	$O(1)$
pop	$O(1)$
peek	$O(1)$

A stack stores items in a last-in, first-out (LIFO) order.

Picture a pile of dirty plates in your sink. As you add more plates, you bury the old ones further down. When you take a plate off the top to wash it, you're taking the last plate you put in. "Last in, first out."

Strengths:

Fast operations. All stack operations take $O(1)$ time.

Uses:

The call stack is a stack that tracks function calls in a program. When a function returns, which function do we "pop" back to? The last one that "pushed" a function call.

Depth-first search uses a stack (sometimes the call stack) to keep track of which nodes to visit next.

String parsing—stacks turn out to be useful for several types of string parsing.

Implementation

You can implement a stack with either a linked list or a dynamic array—they both work pretty well:

	Stack Push	Stack Pop
Linked Lists	insert at head	remove at head
Dynamic Arrays	append	remove last element
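The dynamic-array row of that table maps directly onto a Python list. Here is a minimal sketch; the `Stack` class is a hypothetical wrapper written for illustration, since a bare list with `append()` and `pop()` works just as well.

```python
class Stack(object):

    def __init__(self):
        self.items = []

    def push(self, item):
        # Append to the end of the dynamic array
        self.items.append(item)

    def pop(self):
        # Remove the last element of the dynamic array
        if not self.items:
            raise IndexError('pop from an empty stack')
        return self.items.pop()

    def peek(self):
        if not self.items:
            raise IndexError('peek at an empty stack')
        return self.items[-1]


plates = Stack()
for plate in ['first', 'second', 'third']:
    plates.push(plate)

print(plates.pop())   # third
print(plates.peek())  # second
```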

The stack in a depth-first traversal could be the call stack or an actual stack object. Knowing which structure each traversal uses is not just a clue about implementation; it also helps with figuring out the differences in behavior. Those differences come from whether we visit nodes in the order we see them (first in, first out) or we visit the last-seen node first (last in, first out).
